Abstract.

The objective of this paper is to develop neural network based control design techniques which address the issue of
performance/control effort trade-off. Additionally, the control design needs to address the important issue of achieving adequate
performance in the presence of actuator nonlinearities such as position and rate limits. These issues are discussed using the 
example of aircraft flight control. Given a set of pilot input commands, a feedforward net is trained to control the vehicle 
within the constraints imposed by the actuators. This is achieved by minimizing an objective function which is a weighted 
sum of the tracking errors, control input rates and control input deflections. A trade-off between tracking performance and 
control smoothness is obtained by varying, adaptively, the weights of the objective function. The neurocontroller performance
is evaluated in the presence of actuator dynamics using a simulation of the vehicle. Appropriate selection of the different 
weights in the objective function resulted in good tracking of the pilot commands and smooth neurocontrol. An extension of 
the neurocontroller design approach is proposed to enhance its practicality. 

I. Introduction. This paper addresses architecture and training issues in using neural computation 
towards practical control design. Such issues are discussed on a neural net architecture designed to control a 
simplified model of airframe/propulsion-system. Although many papers on application of neural networks to 
control design have appeared in the recent literature, e.g. Refs. [1-3], most of the applications considered are 
either for robotic systems or for control problems that are mainly of academic interest such as the inverted 
pendulum problem. The objective of this paper is to investigate the applicability of neural networks to the 
control of aerospace vehicles. 

In the current literature on neurocontrol design for tracking target trajectories, a great emphasis is placed 
on minimizing tracking error without due emphasis on physical constraints on control inputs. For instance, in 
Ref. [4], a neurocontrol design is presented for a cart-pole system which nearly "exactly" tracks the reference
cart position; however, no information is provided on the control effort (force applied to cart) required to
achieve the indicated performance. In general, such "exact" tracking of reference commands can only be
achieved by using very large control inputs and control rates. A realistic control design problem consists
of achieving a practical performance/control trade-off, i.e. the "best" possible performance within the physical
constraints of the actuators. Clearly then a better understanding needs to be developed on how to achieve 
this desired performance/control trade-off within the framework of neurocontrol design. 

The approach taken in this paper is that of learning the neurocontrol by minimizing an objective function 
which is a weighted sum of tracking errors and control input commands and rates. The notion of weighting the 
control inputs in the objective function has previously been suggested by other researchers, see for example 
Refs. [2-3]. The process of adapting the weights of the objective function (not to be confused with the synaptic 
weights of the neural net), towards maximizing tracking performance within the physical limitation of the 
actuators while providing control smoothness, is described. Knowledge gained about the effect of each
component of the objective function leads to an extension of the architecture that provides the flexibility needed for
practical control design.

II. Vehicle Model. The vehicle model is a linear system of the form: 

$$\dot{x} = A x + B u_a, \qquad z = C x, \qquad (1)$$

where the state vector $x$, defined in Ref. [5], consists of 5 airframe state variables (aircraft body axis forward
and vertical velocities, aircraft pitch rate, pitch angle and altitude), and 4 propulsion system state variables
(engine fan speed, core compressor speed, engine mixing plane pressure, and engine high pressure turbine 
blade temperature). 

l S. Garg and W. Merrill, NASA Lewis Research Center. 

*T. Troudet and D. Mattern, Sverdrup Technology, Inc., 2001 Aerospace Parkway, Brook Park, Ohio 44142. 





The control input vector, $u_a$, of interest is

$$u_a = [WF, STV]^T, \qquad (2)$$

where WF is the engine main burner fuel flow rate (#/hr), and STV is the aft nozzle thrust vectoring angle 
(deg). The vehicle outputs to be controlled are 

$$z = [V, Q]^T, \qquad (3)$$

where $V$ is the aircraft velocity (ft/sec), and $Q$ the pitch rate (deg/s). The system matrices $A$, $B$, and $C$ are
available in Ref. [5]. The open-loop vehicle system defined by (1) to (3) is unstable in pitch response, and is
strongly coupled in the response of the controlled outputs $z$ to control inputs $u_a$.

The control design objective is to design a control system to provide decoupled command tracking of 
velocity and pitch rate from pilot control inputs so as to make the overall closed-loop vehicle system acceptable 
for pilot controlled flight. 

For a given input command $z_{SEL} = [V_{SEL}, Q_{SEL}]^T$ selected by the pilot, the commanded variables
$z_c = [V_c, Q_c]^T$ that are to be tracked by the aircraft are solutions of

$$\dot{x}_m = A_m x_m + B_m z_{SEL}, \qquad z_c = C_m x_m, \qquad (4)$$


where the matrices $A_m$, $B_m$ and $C_m$ represent the desired dynamics of the plant for a pilot selected input
command. The numerical values for $A_m$, $B_m$ and $C_m$ for this example are as listed in Ref. [5]. These
pre-filtering matrices are based on military specifications for level I ("good") flying qualities for piloted aircraft
of the type being considered here (see Ref. [6] for example).

The dynamics of the fuel flow actuator are approximated by a second order system with transfer function 


$$G_{WF}(s) = \frac{10}{s+10} \cdot \frac{50}{s+50}, \qquad (5)$$


and with a maximum fuel flow rate $|WF_{max}|$ = 10,000 #/hr (perturbation around the nominal value), and a
rate limit $|\dot{WF}_{max}|$ = 20,000 #/hr/s. The thrust vectoring actuator is approximated by a first order system
with transfer function 

$$G_{STV}(s) = \frac{15}{s+15}, \qquad (6)$$

with a maximum thrust vector angle $|STV_{max}|$ = 10 deg (perturbation around the nominal value), and a
rate limit $|\dot{STV}_{max}|$ = 20 deg/s.
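The effect of the position and rate limits on a first-order actuator can be illustrated with a short simulation. The following is only a minimal sketch, not the simulation used in this paper: it assumes the thrust vectoring actuator transfer function reads 15/(s + 15), integrates it with forward Euler, and applies the limits directly to the actuator rate and output. The function name and the way the limits are applied are assumptions made for illustration.

```python
import numpy as np

def simulate_first_order_actuator(cmd, dt, bandwidth=15.0,
                                  pos_limit=10.0, rate_limit=20.0):
    """Forward-Euler integration of y' = bandwidth * (cmd - y),
    with the rate and the position saturated at every step.
    All default values are taken from the text; the limiter
    structure itself is an assumption."""
    y = 0.0
    out = np.empty(len(cmd))
    for k, c in enumerate(cmd):
        rate = bandwidth * (c - y)                      # unconstrained actuator rate
        rate = np.clip(rate, -rate_limit, rate_limit)   # rate limit (deg/s)
        y = np.clip(y + dt * rate, -pos_limit, pos_limit)  # position limit (deg)
        out[k] = y
    return out

dt = 0.001                                  # evaluation time-step (s)
t = np.arange(0.0, 2.0, dt)
cmd = np.full_like(t, 30.0)                 # command well beyond the position limit
y = simulate_first_order_actuator(cmd, dt)
```

With a command this large, the actuator output ramps at the rate limit until it reaches the position limit, which is the "riding the limit" behavior that an unconstrained controller can provoke.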

III. Training Architecture. The training architecture is represented in Fig. 1. For each pilot
selected trajectory $z_{SEL}(t)$, a commanded trajectory $z_c(t)$ is generated from (4). Prior to training, the
commanded variables $z_c(t)$ are discretized and scaled into $z_c^*(t_k) = [V_c(t_k)/V_c^0, Q_c(t_k)/Q_c^0]^T$, where $V_c^0$ and
$Q_c^0$ are of the order of magnitude of the maximum values of $V_c(t)$ and $Q_c(t)$ respectively. If $z_c^*(t_{k+1})$ is
the commanded scaled output of the desired dynamics at time $t_{k+1}$, the actual scaled output of the aircraft
controlled by the neural network is obtained as follows.

As shown in Fig. 1, the two control inputs ($u$) are calculated by a two hidden-layer feedforward net
that has eight input units (four pairs of input units associated with the $Q$ and $V$ variables), and two
neurons in the output layer. These pairs consist of the scaled vehicle output vector $z_a^*(t_k)$; the tracking error $e_z(t_k)$
between the scaled vehicle output vector $z_a^*(t_k)$ and its desired scaled value at time $t_{k+1}$ (i.e. $e_z(t_k) =
z_c^*(t_{k+1}) - z_a^*(t_k)$); the discrete time-derivative of the tracking error, $\dot{e}_z(t_k)$; and the time-average of the
tracking error, $\frac{1}{t_k}\int_0^{t_k} e_z(t)\,dt$. The motivation behind using the combination of $z_a^*(t_k)$ and $e_z(t_k)$ as inputs
to the neurocontroller, instead of $z_a^*(t_k)$ and $z_c^*(t_{k+1})$, is to allow the network to reconstruct the command
without direct feedforward of the command (which would lead to a higher bandwidth controller). The role
of the error rate input $\dot{e}_z(t_k)$ is to provide the net with lead information, and the integral error feedback
$\frac{1}{t_k}\int_0^{t_k} e_z(t)\,dt$ is to provide zero steady state tracking error for step commanded inputs. In Fig. 1, $\Delta$ denotes
a time-delay of length $\delta t$. Each neuron has the standard activation function:


$$y = \tanh(x), \qquad (7)$$

which limits its output $y$ to the interval $[-1, +1]$ for any input signal $x$. For a given set of weights of the
neural network, the two output neurons yield the normalized control input vector 


$$u(t_k) = \left[\frac{WF}{WF_{max}},\ \frac{STV}{STV_{max}}\right]^T. \qquad (8)$$
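The assembly of the eight network inputs and the forward pass through the two hidden-layer net can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function names, the random weight initialization, the use of bias terms, and the backward-difference approximation of the error rate are all assumptions. The layer sizes (8, 15, 20, 2) and the actuator maxima used for de-normalization are those quoted in the text.

```python
import numpy as np

def build_inputs(z_a, z_c_next, e_prev, e_sum, t_k, dt):
    """Assemble the eight inputs at time t_k (each entry is a [V, Q] pair)."""
    e = z_c_next - z_a                    # tracking error e_z(t_k)
    e_dot = (e - e_prev) / dt             # discrete time-derivative (assumed backward difference)
    e_sum = e_sum + e * dt                # running integral of the error
    e_avg = e_sum / t_k if t_k > 0 else np.zeros_like(e)  # time-average of the error
    return np.concatenate([z_a, e, e_dot, e_avg]), e, e_sum

def forward(x, params):
    """Feedforward pass with tanh applied at every neuron, as in (7)."""
    h = x
    for W, b in params:
        h = np.tanh(W @ h + b)
    return h                              # normalized controls, each in [-1, 1]

rng = np.random.default_rng(0)            # hypothetical initialization
sizes = [8, 15, 20, 2]
params = [(0.1 * rng.standard_normal((n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

x, e, e_sum = build_inputs(z_a=np.array([0.0, 0.0]),
                           z_c_next=np.array([0.5, -0.2]),
                           e_prev=np.zeros(2), e_sum=np.zeros(2),
                           t_k=0.01, dt=0.01)
u_norm = forward(x, params)
u_phys = u_norm * np.array([10000.0, 10.0])   # scale by WF_max (#/hr) and STV_max (deg)
```

Because the output neurons are themselves tanh units, the normalized commands can never exceed the actuator position limits, whatever the network weights are.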


The fuel flow rate and thrust vectoring angle calculated by the neural net are then applied to the vehicle
model during a small time-interval $\delta t = t_{k+1} - t_k$, and change the state vector from $x(t_k)$ to $x(t_{k+1})$. To
minimize the complexity involved in this preliminary analysis, the actuator dynamics were not explicitly
included in the training architecture of Fig. 1. However, as will be seen in the next section, their bandwidth
limiting effect can be accounted for by training the net to minimize an objective function that includes
tracking errors, control input deflections and control input rates:


$$J(t_{k+1}) = \frac{1}{2}\left( e_z^T(t_{k+1})\,\rho\,e_z(t_{k+1}) + u^T(t_k)\,\Lambda\,u(t_k) + \dot{u}^T(t_k)\,\mu\,\dot{u}(t_k) \right), \qquad (9)$$


where $e_z(t_{k+1})$ is the error between the scaled commanded vector $z_c^*(t_{k+1})$ and the scaled vehicle output
$z_a^*(t_{k+1})$. The matrices $\rho$, $\Lambda$ and $\mu$ are 2x2 diagonal matrices whose coefficients can be adapted so as to
modify the characteristics of the neurocontroller, and allow it to accommodate the physical limitations of
the actuators when operating in closed-loop. It is proposed to use the backpropagation algorithm [7] to find
the weights of the neural net that minimize the objective function (9) over the set of pilot input commands.
In order to backpropagate (9), a neural net emulator (perceptron) was used in place of the vehicle model in
the training architecture of Fig. 1.
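As a concrete illustration of the objective function (9), the sketch below evaluates the weighted sum for one time step. It is a minimal example under stated assumptions: the control rate is approximated by a backward difference of the normalized control vector, and the control and rate weights are taken as scalar multiples of the identity. The numerical values of the weights are those reported later in the paper; the sample error and control vectors are made up for illustration.

```python
import numpy as np

def objective(e_next, u, u_prev, dt, rho, lam, mu):
    """One-step cost: tracking error, control deflection and control rate terms.
    rho, lam, mu are 2x2 diagonal weighting matrices."""
    u_dot = (u - u_prev) / dt             # assumed backward-difference control rate
    return 0.5 * (e_next @ rho @ e_next + u @ lam @ u + u_dot @ mu @ u_dot)

rho = np.diag([2000.0, 20.0])             # tracking weights for [V, Q]
lam = 0.01 * np.eye(2)                    # control deflection weight
mu = 0.017 * np.eye(2)                    # control rate weight (identity form assumed)

J = objective(e_next=np.array([0.01, 0.1]),     # hypothetical scaled errors
              u=np.array([0.2, -0.1]),          # hypothetical normalized controls
              u_prev=np.array([0.19, -0.1]),
              dt=0.01, rho=rho, lam=lam, mu=mu)
```

Setting `lam` and `mu` to zero recovers a pure tracking-error cost, which makes it easy to compare how much each term contributes for a given trajectory.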

The commanded trajectories used to train the net were generated as follows. The pilot selected pitch
rate was a doublet centered at a time $t_c$ between 2.5 s and 5 s, with the characteristics: $Q_{SEL}(t) = Q_0$ for
$t < t_c$; $Q_{SEL}(t) = -Q_0$ for $t_c < t < 2t_c$; $Q_{SEL}(t) = 0$ for $t > 2t_c$. Note that $Q_{SEL}$ corresponds to pilot
longitudinal stick deflection with units in inches. The pilot selected airframe velocity was a step function
characterized by $V_{SEL}(t) = 0$ for $t < 0$ and $V_{SEL}(t) = V_0$ for $t > 0$. The maximum intensities $|Q_0|$ and $|V_0|$
of the selected input commands were bounded by $Q_{max}$ = 0.5 in and $V_{max}$ = 20 ft/s. This maximum value of
$Q_{SEL}$ corresponds to a maximum pitch rate command of about 3 deg/sec. Random sets of input trajectories
were generated from uniform distributions of $Q_0$, $t_c$ and $V_0$ over $[-Q_{max}, Q_{max}]$, [2.5 s, 5 s] and $[-V_{max}, V_{max}]$
respectively. The commanded variables $Q_c(t)$ and $V_c(t)$ were filtered from $Q_{SEL}(t)$ and $V_{SEL}(t)$ over a period
of 12 s with a time-step $\delta t$ = 0.01 s. With these characteristics of the selected input commands, the scaling
factors of the commanded variables $z_c^*(t) = [V_c^*(t), Q_c^*(t)]^T$ are $V_c^0$ = 20 ft/sec and $Q_c^0$ = 3 deg/sec.
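The random generation of pilot selected commands described above can be sketched as follows. This is an illustrative sketch only: the function name and random seed are hypothetical, and the pre-filtering step (4) that turns the selected commands into commanded variables is not reproduced because the matrices $A_m$, $B_m$ and $C_m$ are given only in Ref. [5].

```python
import numpy as np

def sample_selected_commands(rng, q_max=0.5, v_max=20.0, duration=12.0, dt=0.01):
    """Draw one pitch-rate doublet and one velocity step from uniform distributions."""
    q0 = rng.uniform(-q_max, q_max)       # doublet amplitude (inches of stick)
    tc = rng.uniform(2.5, 5.0)            # doublet switching time (s)
    v0 = rng.uniform(-v_max, v_max)       # velocity step amplitude (ft/s)
    n = int(round(duration / dt))
    t = dt * np.arange(n)
    # Q_SEL = Q0 for t < tc, -Q0 for tc <= t < 2 tc, 0 afterwards
    q_sel = np.where(t < tc, q0, np.where(t < 2.0 * tc, -q0, 0.0))
    # V_SEL = 0 for t <= 0, V0 for t > 0
    v_sel = np.where(t > 0.0, v0, 0.0)
    return t, q_sel, v_sel, (q0, tc, v0)

rng = np.random.default_rng(0)            # hypothetical seed
t, q_sel, v_sel, (q0, tc, v0) = sample_selected_commands(rng)
```

Calling the function repeatedly with the same generator yields a training set of the kind described in the text, one trajectory per call.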

IV. Neurocontrol Performance. The evaluation architecture of the neurocontroller in
closed-loop is shown in Figure 2. The neurocontroller was tested on step pitch rate input commands, different
from the doublets used in training. The input commands chosen to illustrate the neurocontrol performance
were defined by the step pitch rate command $Q_{SEL}(t)$ = 0.5 inches (unit of pilot stick deflection) for $t < 3$ sec,
$Q_{SEL}(t) = 0$ for $t > 3$ sec; applied simultaneously with one of the following classes of step velocity commands:
$V_{SEL}(t > 0)$ = -20 ft/sec (case 1); $V_{SEL}(t > 0)$ = 20 ft/sec (case 2); $V_{SEL}(t > 0)$ = 0 ft/sec (case 3).
Neurocontrol was applied to the vehicle-actuator system over a period of 12 sec with a time-step $\delta\tau$ = 0.001 sec.
For brevity, only the results of the evaluation with case 1 commands are presented in this paper.

When training with $\Lambda = \mu = 0$, the neural net learns only to minimize the tracking error $e_z(t_k)$ without
giving any consideration to the cost associated with high control requirements and high control rates.
Training was performed over a set of 2000 commanded trajectories with a network configuration of 15
neurons in the first hidden layer, and 20 neurons in the second hidden layer. The synaptic weights were
updated at every time $t_k$ (= $k\,\delta t$ = $k \times 0.01$ sec) after backpropagating $J(t_k)$ through the network. This
was done once for each trajectory of the training set with a steepest-descent coefficient $\alpha$ = 0.03. For
$\rho = diag[\rho_V, \rho_Q] = diag[2000, 20]$, the net learns to track the commanded outputs nearly perfectly in the
absence of actuators (see Fig. 3a), but with extremely high control input and control rate requirements (see
Fig. 3b). When the actuator dynamics are included in the closed-loop evaluation, the tracking performance
deteriorates significantly, as shown in Fig. 4, with highly oscillatory pitch rate response and a limit cycle
behavior in velocity/fuel-flow response.

The limit cycle in the velocity response is due to the large commanded fuel flow from the neurocontroller.
In order to constrain control commands during training, non-zero values of the control weights $\Lambda$ were chosen
in the objective function to be minimized (see (9)). A study of the tracking-performance/control-requirement
trade-off was conducted by training the network with $\Lambda$ of the form $\Lambda = \lambda I$, where $I$ is the 2x2 identity matrix
and $\lambda$, a scalar, was varied from 0.01 to 0.1, with the same training characteristics and the same matrix
elements of $\rho$ as before. The results from this trade-off study are shown in Fig. 5 in terms of mean-square
velocity tracking error, $e_V$, plotted against the mean-square fuel flow rate for the trained network tracking
the commands of case 1, with actuators included in the evaluation. As seen from Fig. 5, weighting the
control commands in the cost function (9) by the small value $\lambda$ = 0.01 results in a significant decrease
in commanded control activity while simultaneously improving tracking performance. Increasing $\lambda$ beyond
0.01 does not result in any further significant decrease in control activity while the tracking performance
starts to degrade noticeably.

The results of the closed-loop neurocontrol with $\lambda$ = 0.01 are shown in Fig. 6. Clearly, constraining the
control activity during training results in a neurocontroller with stable and much improved velocity tracking
capability, with zero steady state errors for both pitch rate and velocity, and low fuel flow requirements.
However, the pitch rate response is still oscillatory, and it is evident from the thrust vectoring (STV)
requirement plot that the command generated by the neurocontroller is riding the actuator rate limit. Such
ringing of the actuator would lead to premature actuator wear and would also result in undesirable aircraft 
response. A smooth evolution of the control inputs is therefore a practical requirement of control design. 

In order to enhance the control smoothness within the architecture of Fig. 1, the control input rates were
constrained during training by choosing a non-zero $\mu$ in the cost function to be minimized during
training. A performance/control-rate trade-off similar to that for the control activity constraint was also
performed for the control rate constraint. Training was performed in two phases for a network configuration
of 15 neurons in the first hidden layer, and 10 neurons in the second hidden layer. In the gross-tuning
phase of the training, a set of 100 commanded trajectories was randomly generated, and the synaptic
weights were updated following a moving-window scheme: at every time $t_k$, the weights were incremented
after backpropagating through the network the time-integral of the objective function calculated over one
second. This was done once for each trajectory of the training set with a steepest-descent
coefficient $\alpha$ = 0.0001. In the fine-tuning phase of the training, the objective function was sampled every 0.1
sec. over a single commanded trajectory during a 12 sec. period. The changes in the synaptic weights were
calculated for each sample, but the weights were updated only after summing the calculated changes over all
the samples of the training trajectory. This procedure was repeated 50 times with $\alpha$ = 0.0002. The results
with $\mu$ = 0.017 and $\rho$, $\Lambda$ as defined above, are shown in Fig. 7. The pitch rate response no longer exhibits
oscillatory behavior, and the deviation from the ideal response is small. The velocity command tracking is
just as good as for the previously discussed case (Fig. 6), while the control requirements WF and STV are
both much smoother due to the control rate weighting in the performance index.
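The update rule of the fine-tuning phase, in which weight changes are computed per sample but applied only once after summation, can be illustrated on a toy problem. The sketch below is not the network training used in this paper: it applies the same accumulate-then-update pattern to a two-parameter least-squares fit, and the function names, data and step size are hypothetical choices made so the example converges quickly.

```python
import numpy as np

def batch_update(w, samples, grad_fn, alpha):
    """Sum the steepest-descent changes over all samples, then update once."""
    delta = np.zeros_like(w)
    for s in samples:
        delta -= alpha * grad_fn(w, s)    # change for this sample, not yet applied
    return w + delta                      # single weight update per pass

def grad_fn(w, s):
    """Gradient of the toy per-sample cost 0.5 * (w . x - y)^2."""
    x, y = s
    return (w @ x - y) * x

# Hypothetical data: two samples that determine w = [1, -2] exactly.
samples = [(np.array([1.0, 0.0]), 1.0), (np.array([0.0, 1.0]), -2.0)]
w = np.zeros(2)
for _ in range(50):                       # 50 repetitions, as in the fine-tuning phase
    w = batch_update(w, samples, grad_fn, alpha=0.2)   # toy step size, not the paper's
```

By contrast, the earlier training runs described in the text update the weights at every time step, which corresponds to applying each per-sample change immediately instead of accumulating it.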

V. Conclusion. Insight was provided into issues related to practical control design using neural 
networks. An aircraft control design example with two control inputs and two controlled outputs was 
used to illustrate these issues. The control design problem was set up as that of following the trajectories 
generated from a model of the desired vehicle response dynamics from pilot command inputs. The training 
of the network was done without any actuators, to simplify the computation, while resulting neurocontrollers 
were evaluated with the non-linear actuator dynamics including position and rate limits. In most previous 
studies of neurocontrol design for such tracking problems, network training was done with emphasis only 
on minimizing the tracking error. It was shown in this paper that the neurocontrol learned with such a 
training results in very high control and control rate commands. Although nearly perfect tracking of pilot 
commands was achieved with such a control law when the control input was unconstrained, the closed-loop 
system was found to be unstable when the actuator dynamics and position and rate limits were included 
in the evaluation. Performance/control effort trade-off is an important consideration in practical control 
design, and an approach for achieving this trade-off within the framework of neural network based control 
was suggested and investigated in this paper. In this approach, neurocontrol is learned by minimizing 
an objective function which is a weighted sum of tracking errors, control inputs and control input rates. 
Appropriate selection of the different weights in the objective function resulted in good tracking of the pilot 
commands and smooth neurocontrol. The possibility of achieving further improvement in performance by 
including the actuator constraints during training is currently being investigated. 
Figure 1. Training architecture.



Figure 2. Evaluation architecture of closed-loop neurocontroller.
Figure 3b. Closed-Loop Neurocontroller without Actuators; case 1 with $\Lambda = \mu = 0$.